Abstract

Background: A major concern in conservation genetics is to maintain the genetic diversity of populations. Genetic variation in livestock species is threatened by the progressive marginalisation of local breeds in favour of high-output breeds worldwide. We used high-density SNP and re-sequencing data to assess the genetic diversity of local pig breeds from Europe. In addition, we re-sequenced pigs from commercial breeds to identify potential candidate mutations responsible for phenotypic divergence among these groups of breeds.

Results: Our results identify some local breeds with low genetic diversity, whose genomes show a high proportion of runs of homozygosity (>50%) and harbour a large number of potentially damaging mutations. We also observed a high correlation between genetic diversity estimates from high-density SNP data and next-generation sequencing data (r = 0.96 at the individual level). The study of non-synonymous SNPs that were fixed in commercial breeds and also in any local breed, but for a different allele, revealed 99 non-synonymous SNPs affecting 65 genes. Candidate mutations that may underlie differences in adaptation to the environment were exemplified by the genes AZGP1 and TAS2R40. We also observed that highly productive breeds may have lost advantageous genotypes within genes involved in immune response (e.g. IL12RB2 and STAB1), probably as a result of strong artificial selection in intensive pig production systems.

Conclusions: The high correlation between genetic diversity estimates computed with the 60K SNP and whole-genome re-sequencing data indicates that the Porcine SNP60 BeadChip provides reliable estimates of genomic diversity in European pig populations despite the expected bias. Moreover, this analysis provided insights into strategies for the genetic characterization of local breeds. The comparison between re-sequenced local pigs and re-sequenced commercial pigs allowed us to report candidate mutations that may be responsible for phenotypic divergence between these groups of breeds. This study highlights the importance of low-input breeds as a valuable genetic reservoir for the pig production industry. However, the high levels of ROH, inbreeding and potentially damaging mutations emphasize the importance of the genetic characterization of local breeds to preserve their genomic variability.
Background

The use of a relatively small number of international high-output or commercial breeds largely explains the increase in livestock productivity over the past decades. In parallel, the number of commercial populations is itself decreasing owing to the consolidation of breeding stock and breeding companies [1]. While highly productive breeds may not be able to compete with low-input breeds in marginal regions or in extensive production, FAO has expressed concern about the shift from local breeds to high-output animals [2]. Local breeds may be more resistant than high-performance breeds to local diseases, better adapted to the local climate, and adapted to poorer food quality [2,3]. These characteristics of local breeds are very relevant for people living in developing countries, where local domestic animals are an important source of protein. Local breeds are also appreciated in developed countries for their cultural heritage value and as producers of traditional, high-quality meat products [4,5]. Increasingly, local heritage breeds are recognized for their potential in sustainable or organic food production systems. Moreover, they represent a yardstick against which highly selected breeds can be compared, allowing the detection of genes under selection [6]. Lastly, local breeds are claimed to harbour a large amount of the variation within livestock species [7,8], and as such are recognized as important genetic reservoirs that need to be protected for future food security [9].

Despite these inherent properties of local breeds, the long-term survival of many of them is not assured [9]. Inbreeding is particularly relevant in local breeds with small population sizes [5,8]. The loss of genetic diversity within a breed due to drift and inbreeding can directly reduce survival, reproductive efficiency and the capacity to adapt to environmental changes [10]. The reduction in reproduction and growth rates is particularly relevant for local livestock breeds, as it can directly lead to economic loss. Minimising inbreeding is, therefore, a major goal to guarantee the sustainability and maintenance of domestic populations of livestock species.

Genetic characterization of livestock breeds using genetic marker technology is needed to enhance breeding and to better direct biodiversity conservation strategies. In pigs, the Porcine SNP60 BeadChip [11] is a commercially available marker system extensively used in genetic studies (e.g. [12,13]). More recently, whole-genome re-sequencing has emerged as an economically feasible tool for assessing genomic variation among populations [14]. In contrast to the commercially available SNP chip, whole-genome sequences provide the opportunity to perform unbiased and comprehensive studies of genetic diversity [15] and runs of homozygosity [16], and to scan the pig genome for signatures of selection [17,18]. The study of entire genomes increases the amount of information on neutral loci, and thereby the accuracy of estimates of demographically important parameters such as the inbreeding coefficient (F) [19,20]. Next-generation sequencing (NGS) also allows direct assessment of polymorphisms in coding regions that could have consequences in selective processes. For instance, genes involved in local adaptation, or alleles responsible for inbreeding depression, can be analysed [19].

In this study, we first assessed and compared the genetic diversity of low-input breeds from Europe by integrating high-density SNP and re-sequencing data. Secondly, we explored the role of local breeds as reservoirs of genetic variation in a domesticated species. Finally, we assessed differences between local and commercial populations in terms of functional variation and explored evidence for inbreeding in local breeds that could lead to inbreeding depression.

Results 

We genotyped 12 local breeds from the United Kingdom, Spain, Italy and Hungary (Table 1) with the Porcine SNP60 BeadChip [11]. SNP markers with more than 5% missing genotypes were excluded from the analysis. A total of 48,641 SNPs that could be mapped to autosomes on Sus scrofa build 10.2 [14] were finally used for the genetic diversity analysis. In addition, one or two representative genotyped pigs from these breeds were re-sequenced to approximately 10x depth of coverage. The number of genomic variants (SNPs and insertions or deletions, INDELs) varied greatly among the animals studied, ranging from 3.10 million in one Large White pig to 5.77 million in one British Saddleback pig. The number of variants and the variability within exonic, intergenic and intronic regions in all re-sequenced animals are shown in Additional file 1. In addition, a re-sequenced African warthog was used as an outgroup to infer the ancestral or derived status of alleles. Lastly, to characterize the distribution of alleles in non-Western domestic populations, we made comparisons with a panel consisting of European and Asian wild boar and Chinese pigs.
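The marker filter described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' pipeline: the genotype encoding (allele counts 0, 1 or 2, with None for a missing call), the chromosome labels and the function names are assumptions. A SNP is kept when it lies on autosomes 1-18 and at most 5% of its calls are missing.

```python
from typing import Optional

# Assumed encoding: each call is the count of one allele (0, 1 or 2); None = missing.
Genotypes = list[Optional[int]]

# Pig autosomes are chromosomes 1-18; sex-chromosome and unmapped SNPs are dropped.
AUTOSOMES = {str(c) for c in range(1, 19)}


def missing_rate(genotypes: Genotypes) -> float:
    """Fraction of animals with a missing call at one SNP."""
    return sum(g is None for g in genotypes) / len(genotypes)


def filter_snps(snps: dict[str, tuple[str, Genotypes]],
                max_missing: float = 0.05) -> list[str]:
    """Return IDs of autosomal SNPs whose missing-call rate is at most max_missing."""
    return [
        snp_id
        for snp_id, (chrom, genotypes) in snps.items()
        if chrom in AUTOSOMES and missing_rate(genotypes) <= max_missing
    ]


# Toy data (hypothetical): 20 animals per SNP.
snps = {
    "snp_a": ("1", [0, 1, 2, 1] * 5),      # complete calls: kept
    "snp_b": ("1", [0, None] * 10),        # 50% missing: dropped
    "snp_c": ("X", [0, 1, 2, 1] * 5),      # not autosomal: dropped
    "snp_d": ("18", [None] + [1] * 19),    # exactly 5% missing: kept
}
print(filter_snps(snps))  # ['snp_a', 'snp_d']
```

Whether a SNP with exactly 5% missing calls is kept is a boundary choice; the text excludes SNPs with more than 5% missing genotypes, so this sketch keeps it.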

Genetic diversity 

Table 1 Sampling information and analysis performed in each pig population

Breed                 Code  Category    Country          N    SNP  NGS
British Saddleback    BS    Local       UK               29   29   2
Gloucester Old Spots  GO    Local       UK               33   33   2
Large Black           LB    Local       UK               30   30   1
Middle White          MW    Local       UK               27   27   2
Tamworth              TA    Local       UK               30   30   2
Chato Murciano        CM    Local       Spain            46   46   2
Iberian pig           IB    Local       Spain            29   29   2
Cinta Senese          CS    Local       Italy            13   13   1
Casertana             CT    Local       Italy            15   15   2
Nera Siciliana        NS    Local       Italy            15   15   0
Calabrese             CA    Local       Italy            15   15   1
Mangalica             MA    Local       Hungary          25   0    2
Duroc                 DU    Commercial  International    2    0    2
Large White           LW    Commercial  International    2    0    2
Landrace              LR    Commercial  International    2    0    2
Pietrain              PI    Commercial  International    2    0    2
Warthog               -     Wild        -                2    0    2
Wild boar             WB    Wild        China            3    0    3
Wild boar             WB    Wild        The Netherlands  2    0    2

N: number of animals sampled; SNP: number genotyped with the Porcine SNP60 BeadChip; NGS: number re-sequenced.

To estimate the genetic diversity of the populations with 60K data, we used the expected and observed heterozygosity (He_60K and Ho_60K) computed with Genepop [21]. We also estimated the individual inbreeding coefficient averaged within each population (F_60K) (Table 2). In addition, NGS data were used to calculate heterozygosity (h_NGS) [15]. h_NGS was estimated for each pig separately and, when data from two individuals were available, their average was used as the estimate of h_NGS for the breed. The comparison of genetic diversity derived from 60K and NGS data is shown in Table 2 and Figure 1.

The study of European local breeds indicated that Mangalica has the lowest genetic diversity (He_60K = 0.19; h_NGS = 7.58E-04) and British Saddleback the highest (He_60K = 0.29; h_NGS = 2.16E-03). The two marker systems also agreed on the low genomic variability of the Cinta Senese breed (He_60K = 0.20; h_NGS = 1.14E-03), the high variability of Chato Murciano and Middle White (He_60K = 0.28 and 0.27; h_NGS = 1.87E-03 and 1.81E-03, respectively) and the intermediate levels of Calabrese (He_60K = 0.24; h_NGS = 1.62E-03). Minor disagreements between the genotyping methods were observed in the Iberian breed, with a lower estimate of genetic diversity based on NGS than on 60K data. In the English breeds Tamworth and Gloucester Old Spots, genetic diversity was low according to the 60K data (He_60K ≤ 0.21) but intermediate according to the NGS data (h_NGS = 1.45E-03). We observed proportionally higher diversity in the Casertana breed when 60K data were used at the population level (Figure 1A). However, this disagreement between NGS and 60K data was not observed at the individual level (Figure 1B). This can be explained by the presence of five Casertana pigs with negative inbreeding coefficient (F) values (see Additional file 2) that were analysed with 60K but not with NGS data. These Casertana pigs may have been recently crossed with other pigs, inflating the overall population estimate of diversity based on 60K data.
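The diversity statistics compared in this section can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in the text: genotypes are coded as allele counts (0, 1, 2, or None when missing); He is the uncorrected 2p(1 - p) averaged over SNPs (Genepop may apply a sample-size correction); individual F is the common method-of-moments estimator F = (O - E)/(N - E) on homozygous genotype counts; and h_NGS is simplified to heterozygous sites per callable base, without the filtering used for the NGS estimate in [15]. All function names are hypothetical.

```python
from typing import Optional, Sequence

Call = Optional[int]  # assumed: count of the alternative allele (0, 1, 2) or None if missing


def alt_allele_freq(column: Sequence[Call]) -> float:
    """Alternative-allele frequency p at one SNP, ignoring missing calls."""
    calls = [g for g in column if g is not None]
    return sum(calls) / (2 * len(calls))


def population_ho_he(matrix: Sequence[Sequence[Call]]) -> tuple[float, float]:
    """Mean observed (Ho) and expected (He = 2p(1 - p)) heterozygosity over SNPs.

    Rows of `matrix` are animals, columns are SNPs.
    """
    n_snps = len(matrix[0])
    ho_total = he_total = 0.0
    for j in range(n_snps):
        column = [row[j] for row in matrix]
        calls = [g for g in column if g is not None]
        ho_total += sum(g == 1 for g in calls) / len(calls)
        p = alt_allele_freq(column)
        he_total += 2 * p * (1 - p)
    return ho_total / n_snps, he_total / n_snps


def individual_f(genotypes: Sequence[Call], freqs: Sequence[float]) -> float:
    """Method-of-moments inbreeding coefficient F = (O - E) / (N - E).

    O: observed homozygous calls; E: expected homozygous calls, the sum of 1 - 2p(1 - p);
    N: non-missing calls. Undefined if every SNP is monomorphic (N == E).
    """
    observed = expected = n = 0.0
    for g, p in zip(genotypes, freqs):
        if g is None:
            continue
        n += 1
        observed += 1 if g != 1 else 0
        expected += 1 - 2 * p * (1 - p)
    return (observed - expected) / (n - expected)


def h_ngs(n_heterozygous_sites: int, n_callable_bases: int) -> float:
    """Simplified per-base heterozygosity of one re-sequenced genome."""
    return n_heterozygous_sites / n_callable_bases


# Toy data (hypothetical): 4 animals x 2 SNPs.
matrix = [[0, 0], [1, 0], [2, 1], [1, None]]
freqs = [alt_allele_freq([row[j] for row in matrix]) for j in range(2)]
ho, he = population_ho_he(matrix)
print(round(ho, 3), round(he, 3))       # 0.417 0.389
print(individual_f(matrix[0], freqs))   # 1.0: homozygous at every SNP
print(h_ngs(1_500, 1_000_000))          # 0.0015
```

In this estimator, negative F values, such as those of the five Casertana pigs mentioned above, arise when an animal carries more heterozygous calls than expected from the allele frequencies.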

The study of parameters at the individual level (F_60K and h_NGS) allows a direct comparison of genetic diversity between the two marker systems (Figure 1B). To further assess the ascertainment bias, commercial and Asian pigs were included, since the former are expected to suffer less bias whereas Asian pigs are expected to have high ascertainment bias [14]. As expected, the major disagreement between 60K and NGS data across all populations was found in the Asian pigs, whose genetic diversity was largely underestimated by the 60K data (Figure 1B; Table 2). Apart from Asian pigs, English and commercial pigs tended to have higher NGS-based genetic diversity than predicted from the 60K data by the fitted line (Figure 1B). In contrast, pigs from Italy, Hungary and Spain showed lower NGS-based genetic diversity than predicted from the 60K SNP data. Despite these systematic deviations from the fitted model, Pearson's correlation coefficient computed using European pigs (both local and commercial) was high and significant between Ho_60K and h_NGS (0.89, P < 0.01) and between He_60K and h_NGS (0.84, P < 0.01) at the population level. A very high correlation between h_NGS and F_60K was observed when local pigs were analysed at the individual level (r = -0.96, P < 0.01). Including the five Asian pigs in the analysis resulted in non-significant correlations lower than 0.2.
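The correlations reported above are Pearson coefficients. A dependency-free sketch of the coefficient, and of the t statistic commonly used to test it against zero, t = r sqrt((n - 2)/(1 - r^2)) with n - 2 degrees of freedom, is shown below. The paired values are hypothetical and only illustrate the calculation; obtaining the P value additionally requires the t distribution, for example via scipy.stats.pearsonr.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt((n - 2) / (1 - r^2)); compare with a t distribution on n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical per-animal values of F_60K and h_NGS (not the study's data).
f_60k = [0.10, 0.20, 0.35, 0.50]
h_ngs_values = [2.1e-3, 1.8e-3, 1.3e-3, 0.8e-3]
r = pearson_r(f_60k, h_ngs_values)
print(r)  # strongly negative: more inbred animals are less heterozygous
```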

Table 2 Genetic diversity parameters using the Porcine SNP60 BeadChip (SNP) and next-generation sequence data (NGS)

Continent  Category    Population  Ho (SNP)  He (SNP)  F (SNP)  Het (NGS)
Europe     Local       BS          0.28      0.29      0.13     2.16E-03
Europe     Local       CA          0.27      0.24      0.17     1.62E-03
Europe     Local       CM          0.26      0.28      0.17     1.87E-03
Europe     Local       CS          0.19      0.20      0.41     1.14E-03
Europe     Local       CT          0.26      0.27      0.21     1.27E-03
Europe     Local       GO          0.21      0.21      0.34     1.45E-03
Europe     Local       IB          0.21      0.23      0.33     1.34E-03
Europe     Local       LB          0.25      0.25      0.23     1.86E-03
Europe     Local       MA          0.15      0.19      0.55     7.58E-04
Europe     Local       MW          0.27      0.27      0.16     1.81E-03
Europe     Local       TA          0.20      0.20      0.38     1.45E-03
Europe     Commercial  DU          0.26      0.27      0.29     1.63E-03
Europe     Commercial  LR          0.31      0.32      0.16     2.07E-03
Europe     Commercial  LW          0.30      0.31      0.19     1.82E-03
Europe     Commercial  PI          0.31      0.30      0.16     2.09E-03
Europe     Wild        WB_NL       0.17      0.19      0.55     1.01E-03
Asia       Wild        WB_NCH      0.17      0.18      0.53     2.96E-03
Asia       Wild        WB_SCH      0.21      0.22      0.45     3.49E-03
Asia       Local       MS          0.17      0.17      0.53     2.54E-03

Ho: observed heterozygosity; He: expected heterozygosity; F: inbreeding coefficient; Het: heterozygosity estimated using NGS data [15].

The number and length of runs of homozygosity (ROH) varied greatly among populations, as estimated from both 60K and NGS data. In agreement with the genetic diversity estimates, all analyses showed that the Mangalica breed had the highest proportion of the genome covered by ROH (Figure 2). The Italian breeds Casertana and Cinta Senese and the English breeds Tamworth and Gloucester Old Spots also had high ROH coverage (50-55% using NGS data). At the other end of the spectrum, British Saddleback showed the lowest proportion (35%), followed by Calabrese and Chato Murciano (~40%). ROH estimates derived from NGS and 60K SNP data were highly correlated, although the 60K SNP data consistently underestimated the proportion of the genome covered by ROH (Figure 2). Comparing the number and length of ROH between 60K and NGS data revealed that the 60K data tended to miss short ROH and to overestimate the length of long ROH (Additional file 3). The correlations between the total ROH length, estimated with NGS data, and the genetic diversity estimates F_60K and h_NGS were 0.79 and 0.84, respectively. Comparing F values with the total length of ROH in the Calabrese, Chato Murciano, Casertana and Middle White populations revealed pigs with negative F values and with fewer and shorter ROH (Additional file 2).
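The percentage of the genome covered by ROH (Figure 2) is the total length of the union of an animal's ROH segments divided by the length of the genome analysed. The sketch below assumes that ROH have already been called (ROH detection itself is not shown) and are given as half-open intervals per chromosome; the function name and toy coordinates are hypothetical. Merging overlapping segments avoids double-counting.

```python
from collections import defaultdict


def roh_coverage(rohs: list[tuple[str, int, int]], genome_length: int) -> float:
    """Fraction of the genome covered by ROH given as (chrom, start, end), end exclusive."""
    by_chrom: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in rohs:
        by_chrom[chrom].append((start, end))

    covered = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:      # overlapping or adjacent: extend the current run
                cur_end = max(cur_end, end)
            else:                     # gap: close the current run and start a new one
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return covered / genome_length


# Hypothetical ROH for one animal on a toy 1 Mb genome.
rohs = [("1", 0, 100_000), ("1", 50_000, 150_000), ("2", 0, 50_000)]
print(roh_coverage(rohs, 1_000_000))  # 0.2
```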



Functional significance of non-synonymous variants 

Of all the SNPs discovered by NGS, an average of 0.17% were annotated as non-synonymous variants (Additional file 1). Across all individuals, we observed a total of 16,409 distinct non-synonymous SNPs. All non-synonymous SNPs were analysed with Polyphen2 [22], which classifies mutations as benign or possibly/probably damaging. In agreement with the genetic diversity estimates, a high number of potentially damaging mutations were fixed in the Mangalica, Cinta Senese, Tamworth and Gloucester Old Spots breeds (Additional file 4). A phylogenetic tree of local breeds based on the 16,409 non-synonymous SNPs was highly similar to the tree computed with 60K SNP data (Additional file 5). All English breeds clustered together and were differentiated from the other European populations, which may reflect similarities in their demographic history. The Calabrese and Chato Murciano breeds occupied an intermediate position between non-introgressed European pigs and English breeds as a result of indirect Asian introgression from English and/or commercial pigs.
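Counting potentially damaging mutations fixed in a breed, as summarised in Additional file 4, amounts to combining the Polyphen2 class of each non-synonymous SNP with the genotypes of the re-sequenced animals. The sketch below is hypothetical: it assumes the annotated amino-acid change corresponds to the alternative allele (genotype 2 = homozygous alternative) and treats a mutation as fixed when every re-sequenced animal of the breed is homozygous for it, which, with one or two animals per breed, is a sample-level criterion. Polyphen2 reports the classes benign, possibly damaging and probably damaging.

```python
DAMAGING = {"possibly damaging", "probably damaging"}


def fixed_damaging(breed_genotypes: dict[str, list[int]],
                   polyphen_class: dict[str, str]) -> list[str]:
    """SNPs classed as damaging whose alternative allele is homozygous in all animals."""
    return [
        snp
        for snp, genotypes in breed_genotypes.items()
        if polyphen_class.get(snp) in DAMAGING and all(g == 2 for g in genotypes)
    ]


# Hypothetical annotations and genotypes for two re-sequenced animals of one breed.
polyphen_class = {"s1": "probably damaging", "s2": "benign", "s3": "possibly damaging"}
breed_genotypes = {"s1": [2, 2], "s2": [2, 2], "s3": [2, 1]}
print(fixed_damaging(breed_genotypes, polyphen_class))  # ['s1']
```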

To find SNPs that potentially explain phenotypic differences between local populations and high-output pigs, we extracted all non-synonymous SNPs and computed F_ST. The eight pigs from commercial elite lines (Duroc, Large White, Landrace and Pietrain) were treated as one population, and each local breed was used separately to determine F_ST. We focused on non-synonymous SNPs that were fixed in the commercial breeds and also in at least one local breed, but for a different allele (i.e. F_ST = 1). Moreover, we explored the occurrence of ROH and published QTL overlapping these SNPs.
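The screen described above can be sketched as follows. The text does not state which F_ST estimator was used, so this sketch assumes a simple Nei-style estimator for two equally weighted groups, F_ST = (H_T - H_S)/H_T with H = 2p(1 - p), computed from sample allele frequencies; this estimator equals 1 exactly when the two groups are fixed for different alleles. Genotypes are coded as allele counts, and the function names are hypothetical.

```python
from typing import Optional, Sequence


def alt_freq(genotypes: Sequence[Optional[int]]) -> float:
    """Alternative-allele frequency from 0/1/2 allele counts, ignoring missing (None)."""
    calls = [g for g in genotypes if g is not None]
    return sum(calls) / (2 * len(calls))


def fst_two_groups(p1: float, p2: float) -> Optional[float]:
    """Nei-style F_ST = (H_T - H_S) / H_T for one biallelic SNP and two groups."""
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-group heterozygosity
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                       # heterozygosity of the pooled groups
    if h_t == 0:
        return None                                     # monomorphic in both groups
    return (h_t - h_s) / h_t


def fixed_for_different_alleles(commercial: Sequence[Optional[int]],
                                local: Sequence[Optional[int]]) -> bool:
    """True when F_ST == 1, i.e. each group is fixed and the fixed alleles differ."""
    fst = fst_two_groups(alt_freq(commercial), alt_freq(local))
    return fst == 1.0


# Hypothetical calls: eight commercial pigs vs. two pigs of one local breed.
commercial = [0, 0, 0, 0, 0, 0, 0, 0]
local = [2, 2]
print(fixed_for_different_alleles(commercial, local))  # True
```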

This analysis revealed 99 non-synonymous SNPs, affecting 65 genes, with different fixed alleles in commercial pigs and in at least one local breed (Additional file 6). Comparison with the warthog revealed that, for 64% of the fixed differences, the derived allele was fixed in local pigs, and for 36% in commercial pigs. Among these 65 genes, we focused on those (i) with both alleles, the ancestral and the derived, present in wild populations, (iii) affected by several fixed SNPs and (iv) with a mutation classified by Polyphen2 (Additional file 6; Figure 3).
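The ancestral/derived assignment above can be sketched as follows, assuming that the allele carried by the warthog outgroup is ancestral and that the other allele at a fixed difference is derived. Sites where the outgroup carries neither allele are left unresolved. The inputs (one allele per group at each fixed SNP), the function names and the toy sites are hypothetical.

```python
from collections import Counter
from typing import Iterable


def derived_group(commercial_allele: str, local_allele: str, outgroup_allele: str) -> str:
    """Which group is fixed for the derived (non-outgroup) allele at one fixed difference."""
    if outgroup_allele == commercial_allele:
        return "local"
    if outgroup_allele == local_allele:
        return "commercial"
    return "unresolved"


def derived_fractions(sites: Iterable[tuple[str, str, str]]) -> dict[str, float]:
    """Share of resolved fixed differences in which each group carries the derived allele."""
    counts = Counter(derived_group(c, loc, o) for c, loc, o in sites)
    resolved = counts["local"] + counts["commercial"]
    if resolved == 0:
        raise ValueError("no site could be polarised with the outgroup")
    return {group: counts[group] / resolved for group in ("local", "commercial")}


# Hypothetical (commercial, local, warthog) alleles at four fixed differences.
sites = [("A", "G", "A"), ("C", "T", "C"), ("G", "A", "A"), ("T", "C", "G")]
print(derived_fractions(sites))
```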

Figure 1 Comparison of genetic diversity estimated with NGS and 60K SNP data. (A) Heterozygosity (Het) estimated with NGS vs. observed heterozygosity (Ho) estimated with 60K data at the population level in local breeds. Each dot represents the average value in a population; dot size is proportional to the inbreeding coefficient (F) observed in the population. (B) Heterozygosity (Het) estimated with NGS vs. inbreeding coefficient (F) estimated with 60K data at the individual level. Each dot represents a single pig; dot size is proportional to the population-level Ho from 60K data. The line that best fits the estimates in European pigs is displayed. The lack of correlation observed in Asian pigs indicates high ascertainment bias.

Figure 2 Comparative analysis of the percentage of the genome covered by ROH in each breed. Estimates from NGS data are shown in blue and estimates from 60K data in red.

We observed a possibly damaging mutation in the gene AZGP1 in the Mangalica, Cinta Senese and Gloucester Old Spots breeds, as well as in European wild boar. This mutation overlaps with QTLs related to vertebra number, abdominal fat and ear morphology. It lies in a 50 kb genomic region where genetic diversity varied greatly among populations (from 0 to 5 times the average genetic diversity in the pig). We observed two fixed SNPs within the gene IL12RB2, with Gloucester Old Spots, Middle White, Tamworth and Calabrese carrying the two ancestral alleles. Other local breeds such as British Saddleback and Chato Murciano were heterozygous at this locus, as were European and Asian wild pigs. This genomic region overlaps with meat and carcass quality QTLs, such as backfat thickness and intramuscular fat content, and with the production QTLs for average daily gain and body weight. It also overlaps with ROH or regions of low genetic diversity, except in British Saddleback and Large Black. A mutation classified as benign was observed within the gene STAB1. This gene codes for a protein involved in defence against bacterial infection by binding to bacteria and inducing phagocytic activity [25-27]. The allele was present in English breeds, Casertana and Asian pigs. STAB1 overlaps with four QTLs related to CD4 and CD8 leukocyte percentage and ratio. Genetic diversity in this region is low, especially in commercial breeds, with seven out of eight commercial pigs having a ROH overlapping it. The two Mangalica animals, Chato Murciano and several English pigs were all homozygous for three derived alleles within the gene EIF2AK3, whereas wild pigs had only one. The protein encoded by this gene is involved in skeletal system development. The gene overlaps with QTLs for feet and leg conformation and osteochondrosis score. Local pigs carrying the derived allele have a ROH or low genetic diversity in the 50 kb region overlapping this gene.
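Statements such as "genetic diversity varied from 0 to 5 times the average" compare heterozygosity in a 50 kb window with the genome-wide value. A minimal sketch follows, assuming 0-based positions of heterozygous sites on one chromosome, a known genome-wide heterozygosity per bp (for example h_NGS), and that every base is callable (real data would divide by the callable bases in each window). The function name and toy numbers are hypothetical.

```python
from collections import Counter


def window_diversity_ratio(het_positions: list[int], chrom_length: int,
                           genome_h: float, window: int = 50_000) -> list[float]:
    """Heterozygous sites per bp in each window, divided by the genome-wide value genome_h (> 0)."""
    counts = Counter(pos // window for pos in het_positions)
    n_windows = (chrom_length + window - 1) // window    # ceiling division
    ratios = []
    for w in range(n_windows):
        width = min(window, chrom_length - w * window)   # the last window may be shorter
        ratios.append((counts[w] / width) / genome_h)
    return ratios


# Hypothetical 150 kb chromosome: 10, 0 and 5 heterozygous sites in the three windows.
het_positions = list(range(0, 10_000, 1_000)) + list(range(100_000, 105_000, 1_000))
print(window_diversity_ratio(het_positions, 150_000, genome_h=1e-4))
```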

It must be considered that the large size of some QTLs and ROHs could lead to random associations with the SNPs under study. Therefore, we tested whether the overlap between non-synonymous SNPs and QTLs differed significantly from random using a permutation test with 1,000 resamples (Additional file 7). The analysis showed non-random overlap between non-synonymous SNPs and exterior QTLs for ear weight, area and size (P < 0.002) and for leg conformation (P < 0.007), as well as for average daily gain and body weight (P < 0.01), which are categorized as production QTLs. QTLs related to leukocyte number were also significantly overrepresented in the analysis (P < 0.008). On the other hand, we cannot rule out a random association between SNPs and QTLs in the categories meat and carcass quality and vertebra number (P > 0.05).
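A permutation test of this kind can be sketched as below. The resampling scheme is not described in the text, so this sketch makes assumptions: SNP positions are re-drawn uniformly along a single coordinate axis (e.g. a concatenated genome), keeping the number of SNPs fixed; the statistic is the number of SNPs inside any QTL; and the one-sided P value uses the common (k + 1)/(n + 1) convention. Names and toy coordinates are hypothetical.

```python
import random
from typing import Sequence


def count_overlaps(positions: Sequence[int], qtls: Sequence[tuple[int, int]]) -> int:
    """Number of SNPs lying inside at least one QTL interval [start, end)."""
    return sum(any(start <= pos < end for start, end in qtls) for pos in positions)


def permutation_p_value(positions: Sequence[int], qtls: Sequence[tuple[int, int]],
                        genome_length: int, n_perm: int = 1_000,
                        seed: int = 1) -> tuple[int, float]:
    """Observed overlap count and one-sided permutation P value (k + 1) / (n_perm + 1)."""
    rng = random.Random(seed)
    observed = count_overlaps(positions, qtls)
    k = 0
    for _ in range(n_perm):
        random_positions = [rng.randrange(genome_length) for _ in positions]
        if count_overlaps(random_positions, qtls) >= observed:
            k += 1
    return observed, (k + 1) / (n_perm + 1)


# Hypothetical: 20 SNPs, all inside one QTL that spans 1% of a 1 Mb genome.
snp_positions = list(range(100, 10_000, 500))
qtls = [(0, 10_000)]
print(permutation_p_value(snp_positions, qtls, 1_000_000))
```

Under this convention, the smallest attainable P value with 1,000 resamples is 1/1001. Drawing positions uniformly ignores gene density; sampling from the positions of all non-synonymous SNPs would be a stricter null.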

Discussion 

Advances in sequencing technologies now allow whole genomes of multiple individuals to be sequenced [19,28]. However, the cost of this technology is still high, and budgets for conservation genetics research are limited. While high-density SNP panels allow a representative sample of a population to be studied at a much lower cost, there is concern regarding the ascertainment bias implicit in the use of SNP chips [29]. This concern is even greater for local pig populations, since they were not considered in the design of the Porcine SNP60 BeadChip [11]. In this study, we found a high correlation between diversity estimates derived from the Illumina Porcine SNP60 BeadChip and NGS data when local European breeds were analysed. These results indicate that the Illumina Porcine SNP60 BeadChip provides reliable estimates of genomic diversity for comparative studies between European populations, despite the expected bias. Nevertheless, English breeds showed higher NGS-based diversity, relative to their 60K-based diversity, than expected from the relationship fitted across all populations combined. These results may highlight the influence of historical breeding practices, whereby Asian pigs were used to improve local English pigs during the late 18th and 19th centuries [14,30]. Despite the additional diversity found in English pigs owing to Asian introgression, some English pigs display high levels of ROH and potentially damaging mutations as a result of recent inbreeding, which could indicate that these breeds are prone to inbreeding depression.

Figure 3 Chromosomes 3, 6 and 18 are arranged circularly end-to-end using Circos [23]. From inside to outside, the four inner rings display ROH (green and blue bars) and genetic diversity (red histograms) in Large White, Landrace, Mangalica and Tamworth, respectively. Some QTLs overlapping any of the four genes studied are represented in yellow (QTL1: abdominal fat weight; QTL2: osteochondrosis score; QTL3: intramuscular fat content; QTL4: backfat thickness; QTL5: feet and leg conformation; QTL6: vertebra number). The outer ring represents the averaged high-density recombination map described by Tortereau et al. [24].

SNP variants were annotated and potential deleterious effects were predicted with Polyphen2. Recessive deleterious alleles can be a major cause of inbreeding depression in populations with low genetic diversity [31]. In our study, we found the largest number of putative deleterious mutations in the animals that also had the highest percentage of the genome covered by ROH and the lowest genetic diversity, i.e. the Mangalica and Cinta Senese breeds, and in the Tamworth and Gloucester Old Spots breeds. Genomic diversity in these breeds was lower than in almost all domestic and wild populations from Europe and Asia [16], supporting the hypothesis that damaging mutations can accumulate, due to drift, in populations with high levels of inbreeding. A similar relation between genetic diversity and the proportion of deleterious alleles has been described in human populations [32] and is thought to be caused by less effective purifying selection as effective population size decreases. This finding points to the need to develop conservation programmes for endangered livestock populations that are prone to high levels of inbreeding.
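The link between effective population size and the efficacy of purifying selection can be illustrated with a standard population-genetics result that is not part of this study's analysis: Kimura's diffusion approximation for the fixation probability of a new mutation, u = (1 - e^(-2s)) / (1 - e^(-4Ns)). It assumes a Wright-Fisher population whose effective size equals N and an allele with additive selection coefficient s (s < 0 for a deleterious allele). When 4N|s| is small, u approaches the neutral value 1/(2N), so drift rather than selection governs the fate of the allele. The function name and parameter values are illustrative.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's approximation for a new mutation (initial frequency 1/(2N)), assuming Ne = N.

    u = (1 - exp(-2s)) / (1 - exp(-4Ns)); s < 0 for a deleterious allele.
    """
    if s == 0:
        return 1 / (2 * n)              # neutral limit
    if -4 * n * s > 700:                # exp would overflow; u is effectively zero
        return 0.0
    # 1 - exp(-x) is written as -expm1(-x) for numerical accuracy when x is small
    return -math.expm1(-2 * s) / -math.expm1(-4 * n * s)


s = -0.01  # hypothetical mildly deleterious allele
for n in (10, 100, 1000):
    u = fixation_probability(s, n)
    print(n, u, u * 2 * n)  # last column: ratio to the neutral fixation probability 1/(2N)
```

With s = -0.01, the deleterious allele fixes at about 82% of the neutral rate when N = 10, but has a negligible chance of fixation when N = 1,000.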

We found that non-synonymous sites with high allele frequency differences (fixed for different alleles) between commercial and local populations were overrepresented in genes involved in immune response, anatomical development, behaviour and sensory perception. Local breeds tend to be reared in traditional systems without being subjected to intense artificial selection (e.g. BLUP or GBLUP selection) as applied to commercial pig populations. As a result of years of different selection pressures and environments, genomic variation underlying phenotypic differences can be expected. We specifically focused on non-synonymous variants because they alter the amino acid sequence of gene products, which may result in different phenotypes [33]. Although phenotypic change is expected to result to a large extent from gene regulation rather than from differences in amino acid sequences, regulatory variants are currently difficult to predict reliably and were therefore not considered in this study.

The gene AZGP1 stimulates lipid degradation in adipocytes and is therefore considered a lipid-mobilizing factor [34]. This gene is linked with obesity in humans, and its expression is inversely associated with body weight and body fat percentage in mice and humans [35,36]. In pigs, a 20 Mb QTL for abdominal fat weight on chromosome 3 [37] overlaps this gene. Mangalica, Cinta Senese and one European wild boar are homozygous for a derived allele annotated as probably damaging. This allele is absent in commercial pigs and also in some local pigs. In pigs, the inferred "probably damaging" status of the allele may rather indicate a large effect on the phenotype: whereas pigs used to be bred for high fat deposition, lean meat is desired in modern pig production systems. AZGP1 also overlaps with a 16 Mb QTL for ear size, area and weight [38] and an 8.5 Mb QTL for vertebra number [39]. Ear morphology traits have traditionally been used to define breed standards. We observed a non-random overrepresentation of non-synonymous SNPs overlapping with QTLs related to ear morphology. This is in agreement with Wilkinson et al. [6], who found signatures of diversifying selection between European pig breeds in genomic regions associated with ear morphology. Related to vertebra number, we found a non-synonymous mutation within the gene PLAG1 that was fixed in Mangalica and heterozygous in the Iberian and Casertana breeds; PLAG1 has been associated with stature in humans and cattle [40,41]. Rubin et al. [18] reported a strong signature of selection in the domestic pig genome at PLAG1. These data suggest that the mutations found in the genes AZGP1 and PLAG1 may represent signatures of different selection pressures between local breeds such as Mangalica and commercial pigs. Another compelling example of potential differential selection between commercial and local populations is represented by the two mutations found in the bitter taste receptor TAS2R40. The high variability within the family of taste receptor genes has been suggested to be a consequence of the adaptation of populations to specific dietary repertoires and environments [42], such as avoiding the consumption of plant toxins [43].

It has been observed that selection for economically important traits tends to increase susceptibility to environmental factors [44,45]. In our study, ancestral mutations classified as benign in immune-related genes such as IL12RB2 and STAB1 were observed in several local pigs. The IL12RB2 subunit plays an important role in Th1 cell differentiation, which is critical for an effective immune response against different types of pathogens [46]. The three mutations observed in this gene overlap with important QTLs in pig production, such as backfat thickness and intramuscular fat content [47,48]. The fact that mutations in IL12RB2 can lead to a defective IFN-gamma response to microorganisms [49,50] suggests that disadvantageous genotypes could have been maintained in commercial populations.

The EIF2AK3 gene overlaps with QTLs for osteochondrosis score [51] and feet and leg conformation [52]. Moreover, the permutation test using all non-synonymous SNPs showed a non-random overrepresentation of SNPs overlapping with QTLs for leg conformation. Interestingly, this gene is involved in bone mineralization, chondrocyte development, insulin secretion and fat cell differentiation, and has been related to Wolcott-Rallison syndrome in humans [53]. Leg weakness is a major concern in growing pigs raised under modern production systems, and osteochondrosis is considered to be the primary cause of this syndrome. Indeed, intense selection for high growth capacity predisposes pigs to these disorders owing to an imbalance between the development of the skeletal system and of muscle [54]. The allelic differences between local and commercial pigs within the EIF2AK3 gene could reflect strong directional selection in commercial breeds. The fact that the same alleles are segregating in both wild boar and low-input breeds supports this hypothesis.

The genes discussed above had different fixed alleles for non-synonymous SNPs between commercial and local pigs. The presence of both alleles, the ancestral and the derived, in wild boars indicates that the variation was present before domestication. While differences in allele frequencies of SNPs in genes such as AZGP1 and TAS2R40 may underlie rapid adaptation to different environments, they can also arise through drift in small populations in the absence of selection, or even when the allele is in fact disadvantageous. The fixed alleles in EIF2AK3 and IL12RB2 could potentially result in disadvantageous phenotypes in high-output breeds owing to the strong artificial selection for production traits. We showed that genetic variability found in wild populations is also being preserved in local breeds at genomic sites with potential phenotypic effects. This further highlights the importance of preserving local breeds as a source of genomic diversity that could be used in future selection programmes for commercial pigs. However, the results presented also reveal high levels of ROH, inbreeding and potentially damaging mutations that threaten the future of local pig breeds, emphasizing the need to implement conservation programmes to preserve the genomic variability of low-input breeds.

Conclusions 

In this study, we assessed the genetic diversity of low-input breeds from different European regions by integrating high-density SNP and re-sequencing data. The



Herrero-Medrano et al. BMC Genomics 2014, 15:601 
http://www.biomedcentral.com/1471-2164/15/601



Page 9 of 12 



comparison of the estimates from the two marker systems provided insights into strategies for the genetic characterization of local breeds. Furthermore, the re-sequenced local pigs were compared with re-sequenced commercial pigs to identify candidate mutations potentially responsible for phenotypic divergence between these groups of breeds. We observed that local pig breeds are an important source of within-species genomic variation and thereby represent a genomic stock that could be important for future adaptation to long-term changes in the environment or in consumer preferences. However, high levels of inbreeding threaten the long-term survival of some of the local breeds studied.

Methods 

Animals, sampling and SNP genotyping

Blood samples from 315 unrelated domestic pigs were collected, and DNA was extracted using the QIAamp DNA blood spin kit (Qiagen Sciences). The study included domestic pigs belonging to 12 local breeds from England, Spain, Italy and Hungary. Samples were genotyped using the Illumina Porcine 60K iSelect Beadchip [11] according to the manufacturer's protocols. We included only SNPs that mapped to one of the 18 autosomes of Sus scrofa build 10.2 and had less than 5% missing genotypes. In addition, one or two animals of each local breed, with the exception of the Nera Siciliana breed, were selected for re-sequencing. We also re-sequenced eight individuals belonging to the international commercial breeds Duroc, Large White, Landrace and Pietrain. The samples used are detailed in Table 1.
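The SNP quality control described above can be sketched in Python. This is a minimal sketch, assuming the Beadchip calls have already been exported as 0/1/2 allele counts with missing calls stored as None; the record layout and the names SnpRecord, missing_rate and filter_snps are hypothetical and do not correspond to any Illumina or PLINK file format.

```python
from dataclasses import dataclass
from typing import List, Optional

# Sus scrofa build 10.2 has 18 autosomes, labelled "1" to "18" here.
AUTOSOMES = {str(c) for c in range(1, 19)}
MAX_MISSING = 0.05  # SNPs must have less than 5% missing genotypes


@dataclass
class SnpRecord:
    snp_id: str
    chrom: str                      # chromosome label from the SNP map
    genotypes: List[Optional[int]]  # 0/1/2 allele counts; None = no call


def missing_rate(genotypes: List[Optional[int]]) -> float:
    """Fraction of animals without a genotype call at this SNP."""
    return sum(g is None for g in genotypes) / len(genotypes)


def passes_qc(snp: SnpRecord) -> bool:
    """Keep autosomal SNPs with fewer than 5% missing genotypes."""
    return snp.chrom in AUTOSOMES and missing_rate(snp.genotypes) < MAX_MISSING


def filter_snps(snps: List[SnpRecord]) -> List[SnpRecord]:
    return [s for s in snps if passes_qc(s)]


snps = [
    SnpRecord("s1", "1", [0, 1, 2] * 10),           # autosomal, no missing data
    SnpRecord("s2", "X", [0, 1, 2] * 10),           # sex chromosome
    SnpRecord("s3", "2", [None, None] + [1] * 28),  # 2/30 missing (6.7%)
    SnpRecord("s4", "18", [None] + [2] * 39),       # 1/40 missing (2.5%)
]
print([s.snp_id for s in filter_snps(snps)])  # ['s1', 's4']
```

The 5% threshold is applied as a strict inequality, matching "less than 5% missing genotypes".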

Ethics statement 

DNA from Chato Murciano pigs was obtained from blood samples collected by veterinarians. This procedure was approved by the Murcia University Ethics Committee and carried out with the consent of the farmers. All other samples were collected in the framework of the PigBioDiv1 and PigBioDiv2 projects. These DNA samples were obtained from blood samples collected by veterinarians according to national legislation, from tissue samples of animals obtained from the slaughterhouse or, in the case of wild boar, from animals culled within wildlife management programs.

Sequencing alignment and SNP discovery 

Library construction and re-sequencing of the samples were performed using 1-3 µg of genomic DNA following the Illumina library preparation protocols (Illumina Inc.). The library insert size ranged from 300 to 500 bp, and fragments were sequenced from both ends, yielding 2 × 100 bp paired-end reads. Short reads were aligned against the Sus scrofa genome build 10.2 [14] using Mosaik. The pigs were sequenced to a depth of approximately 10x. Further details on sequence mapping can be found in [16].

BAM files generated with the MosaikText function were used for SNP calling against the Sus scrofa genome build 10.2. The mpileup function implemented in SAMtools v1.4-r985 [55] was used to obtain variant calls. Variants were filtered for a minimum SNP and INDEL genotype quality of 20 and 50, respectively. Only variants with a coverage between 5x and twice the genome-wide average were considered.
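The quality and coverage filter can be sketched as a predicate over parsed variant records. This is a minimal sketch, not the SAMtools pipeline itself: it assumes each call has already been parsed into a hypothetical VariantCall record carrying a single quality score and the read depth at the site, and it treats both depth bounds as inclusive, which the text does not specify.

```python
from dataclasses import dataclass

MIN_QUAL = {"SNP": 20, "INDEL": 50}  # minimum quality per variant type
MIN_DEPTH = 5                        # lower coverage bound (5x)


@dataclass
class VariantCall:
    chrom: str
    pos: int
    kind: str     # "SNP" or "INDEL"
    qual: float   # quality score reported by the caller (assumed field)
    depth: int    # read depth at the site


def keep_call(call: VariantCall, genome_mean_depth: float) -> bool:
    """Apply the quality threshold, then the 5x to 2x-average depth window."""
    if call.qual < MIN_QUAL[call.kind]:
        return False
    return MIN_DEPTH <= call.depth <= 2 * genome_mean_depth


calls = [
    VariantCall("1", 1000, "SNP", 35.0, 9),
    VariantCall("1", 2000, "SNP", 15.0, 9),    # fails SNP quality
    VariantCall("1", 3000, "INDEL", 40.0, 9),  # fails INDEL quality
    VariantCall("1", 4000, "SNP", 60.0, 25),   # depth above 2 x 10x
]
kept = [c.pos for c in calls if keep_call(c, genome_mean_depth=10.0)]
print(kept)  # [1000]
```

The upper bound is tied to each animal's genome-wide mean depth, so the same function can be reused across samples sequenced to slightly different depths.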

Data analysis using high-density SNP genotyping 

We used Genepop 4.2 [21] to compute the expected and observed heterozygosity. The inbreeding coefficient was calculated for all individuals using PLINK 1.07 [56]. ROHs were defined with PLINK 1.07 as regions of at least 10 kbp encompassing at least 20 homozygous genomic sites, while allowing one heterozygous SNP. We predefined a minimum SNP density of 1 SNP/Mb and a maximum gap of 1 Mb between consecutive SNPs to ensure that the ROHs were not severely affected by SNP density. Finally, we computed Pearson's correlation coefficient between ROH length and the genetic diversity parameters in each breed using R (www.r-project.org).
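The diversity statistics named above can be illustrated with a short Python sketch. It is not a reimplementation of Genepop or PLINK: expected heterozygosity is the plain Hardy-Weinberg value 2pq without the small-sample corrections those programs may apply, and the inbreeding coefficient uses the method-of-moments form F = (O - E) / (N - E) on homozygous genotype counts, the same form PLINK reports, although PLINK's exact expected counts may differ. Genotypes are assumed to be coded as 0/1/2 allele counts with None for missing calls.

```python
from typing import List, Optional

Genotype = Optional[int]  # copies of one allele (0, 1, 2); None = missing


def allele_freq(locus: List[Genotype]) -> float:
    """Frequency of the counted allele among called genotypes."""
    called = [g for g in locus if g is not None]
    return sum(called) / (2 * len(called))


def observed_het(locus: List[Genotype]) -> float:
    called = [g for g in locus if g is not None]
    return sum(g == 1 for g in called) / len(called)


def expected_het(locus: List[Genotype]) -> float:
    """Hardy-Weinberg expectation 2pq, without small-sample correction."""
    p = allele_freq(locus)
    return 2 * p * (1 - p)


def inbreeding_f(individual: List[Genotype], freqs: List[float]) -> float:
    """Method-of-moments F = (O - E) / (N - E) on homozygous genotype counts."""
    obs = exp = 0.0
    n = 0
    for g, p in zip(individual, freqs):
        if g is None:
            continue  # skip missing calls for this animal
        n += 1
        obs += 1 if g != 1 else 0
        exp += 1 - 2 * p * (1 - p)
    return (obs - exp) / (n - exp)


# Rows are loci, columns are individuals.
loci = [[0, 1, 2, 1], [2, 2, 1, None]]
freqs = [allele_freq(locus) for locus in loci]
individuals = [list(column) for column in zip(*loci)]
print(round(expected_het(loci[0]), 3), round(observed_het(loci[0]), 3))  # 0.5 0.5
print(round(inbreeding_f(individuals[0], freqs), 3))  # 1.0
```

With real data the same functions would run over tens of thousands of loci; two loci are used here only so the arithmetic can be checked by hand.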

Data analysis using NGS data 

Heterozygosity was estimated for each individual as the number of heterozygous sites per 50 kb bin, corrected for the total number of sites per bin [15]. Only sufficiently covered bins (per base, a sequence depth of at least 7x and at most approximately twice the average coverage) were considered. We obtained the heterozygosity of each population by averaging the heterozygosity of all individuals belonging to that population. Correlations between the 60K and NGS genomic diversity estimates were calculated as Pearson's correlation coefficients in the R environment. Graphics were produced using the ggplot2 plotting system for R.
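One possible reading of the bin-based heterozygosity estimate is sketched below in Python. It assumes that "corrected for the total number of sites per bin" means dividing the heterozygous count by the number of callable bases in the bin, and that the individual estimate is the mean over bins with enough callable bases; the min_callable cut-off and all function names are hypothetical. Pearson's r is written out explicitly rather than calling R.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence

MIN_DEPTH = 7  # per-base minimum depth for a base to count as callable


@dataclass
class BinCounts:
    het_sites: int       # heterozygous calls inside the 50 kb bin
    callable_sites: int  # bases passing the depth window


def callable_sites(depths: Sequence[int], mean_depth: float) -> int:
    """Count bases with depth between 7x and about twice the average coverage."""
    return sum(MIN_DEPTH <= d <= 2 * mean_depth for d in depths)


def individual_heterozygosity(bins: List[BinCounts], min_callable: int) -> float:
    """Mean of per-bin heterozygous-site rates over sufficiently covered bins."""
    rates = [b.het_sites / b.callable_sites
             for b in bins if b.callable_sites >= min_callable]
    return sum(rates) / len(rates)


def population_heterozygosity(per_animal: Dict[str, float]) -> float:
    """Population value = average of the individual estimates."""
    return sum(per_animal.values()) / len(per_animal)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


bins = [BinCounts(50, 40_000), BinCounts(20, 40_000), BinCounts(5, 1_000)]
h = individual_heterozygosity(bins, min_callable=25_000)  # third bin is skipped
print(round(h, 6))  # 0.000875
```

Paired per-animal values from the 60K and NGS analyses would then be passed to pearson() to obtain the individual-level correlation.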

To estimate ROHs from re-sequencing data, we followed the procedure implemented by Bosse et al. [16], using a 100 kb sliding window. An ROH was defined as a genomic region of at least 10 kb in which the number of SNPs in an individual is lower than expected based on the genomic average. Briefly, if the number of SNPs per bin was ≤ 0.25 × the genomic average, and if 10 or more consecutive bins showed a total SNP average lower than the genomic average, these bins were extracted as candidate ROHs.
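The two thresholds in this rule can be illustrated with a simplified Python sketch. It is our interpretation, not the Bosse et al. code: it works on non-overlapping bins rather than a sliding window, seeds a run at any bin with at most 0.25 × the genome-wide mean SNP count, extends the run while bins stay below the genome-wide mean, and keeps runs of at least 10 bins. Runs are returned as bin indices; converting them to coordinates would require the bin size, which is left out here.

```python
from typing import List, Sequence, Tuple


def find_roh_candidates(snp_counts: Sequence[int], low_frac: float = 0.25,
                        min_bins: int = 10) -> List[Tuple[int, int]]:
    """Return candidate ROHs as half-open (start_bin, end_bin) index ranges.

    A run is seeded at a bin with <= low_frac x the genome-wide mean SNP count
    and extended while bins stay below the genome-wide mean; runs of at least
    min_bins bins are kept, so their overall mean is also below the genome mean.
    """
    genome_avg = sum(snp_counts) / len(snp_counts)
    runs = []
    i, n = 0, len(snp_counts)
    while i < n:
        if snp_counts[i] > low_frac * genome_avg:
            i += 1
            continue
        j = i
        while j < n and snp_counts[j] < genome_avg:
            j += 1
        if j - i >= min_bins:
            runs.append((i, j))
        # Later seeds inside this stretch would end at the same j, so skip ahead.
        i = max(j, i + 1)
    return runs


counts = [20] * 10 + [1] * 12 + [20] * 10  # one 12-bin SNP-poor stretch
print(find_roh_candidates(counts))         # [(10, 22)]
```

A single moderately SNP-rich bin inside a SNP-poor stretch does not break the run, as long as it stays below the genome-wide average.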

ANNOVAR [57] was used to obtain the functional annotation (non-synonymous, synonymous, stop codon gain/loss, amino acid changes) of the genomic variants in each animal based on the pig reference genome (Swine Genome Sequencing Consortium Sscrofa10.2) obtained from the UCSC database (http://genome.ucsc.edu). For further analysis, only the non-synonymous sites were considered. The genes overlapping the non-synonymous mutations were retrieved using BioMart [58].
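Selecting the non-synonymous sites from the annotation output can be sketched as follows. This assumes the tab-separated layout of ANNOVAR's gene-based .exonic_variant_function file, with the exonic function (for example "nonsynonymous SNV") in the second column and comma-separated Gene:Transcript:exon:cDNA:protein entries in the third. Reading gene names from that column is a shortcut shown for illustration, not the BioMart retrieval used in the study, and the gene and transcript names in the example are placeholders.

```python
import csv
import io
from typing import Iterable, Iterator, List, Set

NONSYN = "nonsynonymous SNV"


def nonsynonymous_rows(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield rows whose exonic function (2nd column) is a non-synonymous SNV."""
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) > 2 and row[1] == NONSYN:
            yield row


def genes_hit(rows: Iterable[List[str]]) -> Set[str]:
    """Gene symbols from the 3rd column (Gene:Transcript:exon:cDNA:protein,...)."""
    genes = set()
    for row in rows:
        for entry in row[2].split(","):
            if entry:
                genes.add(entry.split(":")[0])
    return genes


example = (
    "line1\tnonsynonymous SNV\tGENE_A:TX1:exon2:c.A10G:p.K4E,\t1\t100\t100\tA\tG\n"
    "line2\tsynonymous SNV\tGENE_B:TX2:exon1:c.C3T:p.A1A,\t1\t200\t200\tC\tT\n"
)
print(genes_hit(nonsynonymous_rows(io.StringIO(example))))  # {'GENE_A'}
```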

The FST value for all non-synonymous mutations was calculated using Genepop 4.2 [21]. For this analysis, all commercial pigs were considered as a single population, while each local breed was considered separately. To restrict the analysis to the SNPs most likely to represent the genetic basis of the phenotypic differences between commercial and local breeds, we only included SNPs with FST = 1 between the groups (i.e. fixed differences). Moreover, to avoid false positives, we only considered mutations that were homozygous in both re-sequenced animals of the local breed. For local breeds with only one re-sequenced animal, or when one of the two animals of the breed had missing data, the SNP was not considered for the functional analysis regardless of its FST value. SNPs with missing data in more than three commercial pigs were also excluded.
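The inclusion rules above can be combined into a single per-SNP check. This Python sketch tests for a fixed difference directly, which is the situation an FST of 1 describes, instead of computing FST with Genepop; genotypes are assumed to be 0/1/2 counts of the alternative allele with None for missing calls, and the function names are hypothetical.

```python
from typing import List, Optional

Genotype = Optional[int]    # 0, 1 or 2 copies of the alternative allele; None = missing
MAX_COMMERCIAL_MISSING = 3  # drop SNPs missing in more than three commercial pigs


def fixed_allele(genotypes: List[Genotype]) -> Optional[int]:
    """Return 0 or 2 if every called genotype is the same homozygote, else None."""
    called = {g for g in genotypes if g is not None}
    if len(called) == 1 and called <= {0, 2}:
        return called.pop()
    return None


def is_fixed_difference(commercial: List[Genotype], local: List[Genotype]) -> bool:
    """Commercial pool and one local breed fixed for different alleles."""
    if sum(g is None for g in commercial) > MAX_COMMERCIAL_MISSING:
        return False
    # The local breed needs two re-sequenced animals, both with a call.
    if len(local) < 2 or any(g is None for g in local):
        return False
    commercial_allele = fixed_allele(commercial)
    local_allele = fixed_allele(local)
    return (commercial_allele is not None and local_allele is not None
            and commercial_allele != local_allele)


commercial = [0, 0, 0, 0, 0, 0, None, 0]      # eight commercial pigs, one missing call
print(is_fixed_difference(commercial, [2, 2]))  # True
print(is_fixed_difference(commercial, [2, 1]))  # False: local animal heterozygous
print(is_fixed_difference(commercial, [2]))     # False: single local animal
```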

The sequence of a re-sequenced warthog was used to classify the alleles as ancestral or derived. The genotypes for those SNPs were also obtained from re-sequencing data of two domestic Meishan pigs, one wild boar from South China, two from North China and two European wild boars. Sequence alignment and SNP discovery for these samples were performed as described above.
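Outgroup polarization of a biallelic site can be sketched as below. This is a minimal sketch, assuming the warthog is homozygous and reported as a single allele; sites where the warthog allele is missing or matches neither the reference nor the alternative allele are left unpolarized. The panel sample names are hypothetical labels for the Meishan and wild boar genomes.

```python
from typing import Dict, List, Optional, Tuple


def polarize(ref: str, alt: str, outgroup: Optional[str]) -> Optional[Tuple[str, str]]:
    """Return (ancestral, derived) using the warthog allele as outgroup."""
    if outgroup == ref:
        return ref, alt
    if outgroup == alt:
        return alt, ref
    return None  # missing or third allele: leave the site unpolarized


def derived_carriers(derived: str, panel: Dict[str, Tuple[str, str]]) -> List[str]:
    """Names of panel animals carrying at least one copy of the derived allele."""
    return sorted(name for name, genotype in panel.items() if derived in genotype)


site = polarize("A", "G", outgroup="G")  # ('G', 'A'): A is derived
panel = {"EUWB1": ("A", "G"), "EUWB2": ("G", "G"), "MS1": ("A", "A")}
print(site, derived_carriers(site[1], panel))  # ('G', 'A') ['EUWB1', 'MS1']
```

Listing which wild boars carry the derived allele is what supports statements such as both alleles being present in wild boar before domestication.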

Finally, we used the Polymorphism Phenotyping (PolyPhen2) algorithm [22] to predict the phenotypic consequences of the non-synonymous sites. PolyPhen2 predicts whether a SNP is 'benign', 'possibly damaging' or 'probably damaging' on the basis of evolutionary conservation, structure and sequence information.

Availability of supporting data 

The data sets supporting the results of this article are 
included within the article (and its additional files). 
Additional file 1: Number of genomic variants within exonic, intergenic and intronic regions.

Additional file 2: Inbreeding coefficient vs. length of ROH using 60K data. Each dot represents an individual, and the size of the dots is proportional to the number of ROH carried by the pig. The black line highlights the value F = 0.00.

Additional file 3: Example of ROH estimated with 60K and NGS data on chromosomes SSC1 and SSC13. Two lines of the same color represent the same animal, with the lighter color representing the 60K estimate and the darker color the NGS result. The failure to detect short ROH with 60K data, as well as the overestimation of the length of long ROH, can be observed. From the outside to the inside of the circle: MA (orange), CT (green), TA (red), BS (yellow).

Additional file 4: Detail of the mutations classified by PolyPhen2.

Fixed mutations in local breeds with an available PolyPhen2 classification. Summary of the total number of potentially damaging mutations, the total number of benign mutations and the percentage of damaging mutations in each breed.

Additional file 5: Dendrograms of the tested breeds based on pairwise FST values using 60K and NGS data. Dendrogram based on pairwise FST between local breeds using 60K data; dendrogram based on pairwise FST using 16,409 non-synonymous sites.

Additional file 6: SNPs in coding sequences with extreme differences in allele frequencies (FST = 1) between commercial and local pig populations.

Additional file 7: Permutation analysis to test whether the overlap between non-synonymous SNPs and QTLs differed significantly from random.